Student-Teacher Learning for BLSTM Mask-based Speech Enhancement
نویسندگان
چکیده
Spectral mask estimation using bidirectional long short-term memory (BLSTM) neural networks has been widely used in various speech enhancement applications, and it has achieved great success when it is applied to multichannel enhancement techniques with a mask-based beamformer. However, when these masks are used for single channel speech enhancement they severely distort the speech signal and make them unsuitable for speech recognition. This paper proposes a studentteacher learning paradigm for single channel speech enhancement. The beamformed signal from multichannel enhancement is given as input to the teacher network to obtain soft masks. An additional cross-entropy loss term with the soft mask target is combined with the original loss, so that the student network with single-channel input is trained to mimic the soft mask obtained with multichannel input through beamforming. Experiments with the CHiME-4 challenge single channel track data shows improvement in ASR performance.
منابع مشابه
Combining Bottleneck-BLSTM and Semi-Supervised Sparse NMF for Recognition of Conversational Speech in Highly Instationary Noise
We address the speaker independent automatic recognition of spontaneous speech in highly variable noise by applying semisupervised sparse non-negative matrix factorization (NMF) for speech enhancement coupled with our recently proposed frontend utilizing bottleneck (BN) features generated by a bidirectional Long Short-Term Memory (BLSTM) recurrent neural network. In our evaluation, we unite the...
متن کاملRNN-BLSTM Based Multi-Pitch Estimation
Multi-pitch estimation is critical in many applications, including computational auditory scene analysis (CASA), speech enhancement/separation and mixed speech analysis; however, despite much effort, it remains a challenging problem. This paper uses the PEFAC algorithm to extract features and proposes the use of recurrent neural networks with bidirectional Long ShortTerm Memory (RNN-BLSTM) to m...
متن کاملImproving Mask Learning Based Speech Enhancement System with Restoration Layers and Residual Connection
For single-channel speech enhancement, mask learning based approach through neural network has been shown to outperform the feature mapping approach, and to be effective as a pre-processor for automatic speech recognition. However, its assumption that the mixture and clean reference must have the correspondent scale doesn’t hold in data collected from real world, and thus leads to significant p...
متن کاملSequence Student-Teacher Training of Deep Neural Networks
The performance of automatic speech recognition can often be significantly improved by combining multiple systems together. Though beneficial, ensemble methods can be computationally expensive, often requiring multiple decoding runs. An alternative approach, appropriate for deep learning schemes, is to adopt student-teacher training. Here, a student model is trained to reproduce the outputs of ...
متن کاملSpeech Enhancement using Adaptive Data-Based Dictionary Learning
In this paper, a speech enhancement method based on sparse representation of data frames has been presented. Speech enhancement is one of the most applicable areas in different signal processing fields. The objective of a speech enhancement system is improvement of either intelligibility or quality of the speech signals. This process is carried out using the speech signal processing techniques ...
متن کامل